Method for detecting a threat and threat detecting apparatus

ABSTRACT

Aspects of the disclosure include a threat detecting apparatus. The threat detecting apparatus can include an interface circuit, an opcode detector, and a pattern analyzer. The interface circuit is configured to receive a data stream. The opcode detector can be configured to identify an opcode sequence embedded in the data stream based on a first model graph that includes a plurality of interconnected token nodes. Each token node is representative of an occurrence or a non-occurrence of a token. The pattern analyzer may be configured to identify an opcode signature embedded in the identified opcode sequence based on a second model graph, and to output a signal indicative of the successful identification of the opcode signature. The second model graph can include a plurality of interconnected opcode nodes, and each opcode node can be representative of an occurrence or a non-occurrence of a predetermined combination of one or more opcodes.

INCORPORATION BY REFERENCE

This present disclosure claims the benefit of U.S. application Ser. No. 14/952,204, “Method for Detecting a Threat and Threat Detecting Apparatus” filed on Nov. 25, 2015. This present disclosure is related to U.S. Pat. No. 8,701,162, “Method and System for Detecting and Countering Malware in a Computer” and U.S. patent application Ser. No. 13/617,879, “Method and System for Classifying Vehicle Tracks,” filed on Sep. 14, 2012. The entire disclosures of the prior applications are hereby incorporated by reference herein in their entirety.

BACKGROUND

Various types of threats to computers and networks, such as computer viruses, malware, ransomware, worms, trojan horses, rootkits, keyloggers, dialers, spyware, adware, rogue security software, or the like, are designed to cause detrimental effects on a target machine. In some applications, a threat detection device or program identifies the presence of a threat in a data stream by comparing the binary signatures of known threats with the data stream under inspection. A corresponding action against the data stream may be taken upon the detection of the threat. However, the threats are also being developed or even self-modified to conceal themselves. Merely relying on the binary signatures of known threats may not be sufficient to identify a newly-developed or newly-evolved threat.

SUMMARY

Aspects of the disclosure provide a threat detecting apparatus. The threat detecting apparatus includes an interface circuit, an opcode detector, and a pattern analyzer. The interface circuit is configured to receive a data stream. The opcode detector is configured to identify an opcode sequence embedded in the data stream based on a first model graph. The first model graph includes a plurality of interconnected token nodes. Each token node of the interconnected token nodes is representative of an occurrence or a non-occurrence of a token, and each token is a predetermined combination of bits or bytes. The pattern analyzer is configured to identify an opcode signature embedded in the identified opcode sequence based on a second model graph, and to output a signal indicative of the successful identification of the opcode signature. The second model graph includes a plurality of interconnected opcode nodes, and each opcode node of the interconnected opcode nodes is representative of an occurrence or a non-occurrence of a predetermined combination of one or more opcodes.

In an embodiment, the threat detecting apparatus includes a memory circuit configured to store at least a portion of the first model graph or at least a portion of the second model graph. The memory circuit can be configured to store a set of instructions, and the threat detecting apparatus may further include a processor configured to execute the set of instructions to function as the opcode detector or the pattern analyzer.

The threat detecting apparatus may include an application-specific integrated circuit (ASIC) configured to function as the opcode detector or the pattern analyzer. In an embodiment, at least a portion of the first model graph or a portion of the second model graph is hard-wired in the ASIC.

Aspects of the disclosure provide a method for detecting a threat. The method includes receiving a data stream by an interface circuit, identifying an opcode sequence embedded in the data stream by an opcode detector based on a first model graph, identifying an opcode signature embedded in the identified opcode sequence by a pattern analyzer based on a second model graph, and outputting a signal indicative of the successful identification of the opcode signature by the pattern analyzer. The first model graph includes a plurality of interconnected token nodes. Each token node of the interconnected token nodes is representative of an occurrence or a non-occurrence of a token, and each token is a predetermined combination of bits or bytes. The second model graph includes a plurality of interconnected opcode nodes, and each opcode node of the interconnected opcode nodes is representative of an occurrence or a non-occurrence of a predetermined combination of one or more opcodes.

In an embodiment, identifying an opcode sequence embedded in the data stream includes traversing the first model graph in N process threads based on the data stream and N possible byte alignments, where N is an integer greater than one.

In an embodiment, identifying an opcode sequence embedded in the data stream includes traversing the first model graph in one process thread based on the data stream, where the first model graph incorporates redundant paths based on N possible byte alignments of the data stream.

Aspects of the disclosure provide a threat detecting apparatus. The threat detecting apparatus includes a threat detection circuit that is configured to identify an opcode sequence embedded in a data stream based on a first model graph, identify an opcode signature embedded in the identified opcode sequence based on a second model graph, and output an indication signal indicative of the successful identification of the opcode signature. The first model graph includes a plurality of interconnected token nodes. Each token node of the interconnected token nodes is representative of an occurrence or a non-occurrence of a token, and each token is a predetermined combination of bits or bytes. The second model graph can include a plurality of interconnected opcode nodes, and each opcode node of the interconnected opcode nodes is representative of an occurrence or a non-occurrence of a predetermined combination of one or more opcodes.

BRIEF DESCRIPTION OF THE DRAWINGS

Various embodiments of this disclosure that are proposed as examples will be described in detail with reference to the following figures, wherein like numerals reference like elements, and wherein:

FIG. 1 is a functional block diagram of an example threat detecting apparatus 100 coupled with a threat processor 180 according to an embodiment of the disclosure;

FIG. 2 is a flow chart outlining a process example 200 for detecting a threat according to an embodiment of the disclosure;

FIG. 3 is a flow chart outlining a more detailed exemplary process for identifying an opcode sequence as described in process step S220 according to an embodiment of the disclosure;

FIG. 4 is a graph diagram of an exemplary model graph 400 for identifying an opcode sequence according to an embodiment of the disclosure;

FIG. 5 is a flow chart outlining a more detailed exemplary process for identifying an opcode pattern as described in process step S230 according to an embodiment of the disclosure;

FIG. 6 is a graph diagram of an exemplary model graph 600 for identifying an opcode pattern according to an embodiment of the disclosure;

FIG. 7A is a system block diagram of an example threat detecting apparatus 100A according to an embodiment of the disclosure;

FIG. 7B is a system block diagram of another example threat detecting apparatus 100B according to an embodiment of the disclosure;

FIG. 8 is a flow chart outlining a process example 800 for generating a model graph 400 for detecting an opcode sequence according to an embodiment of the disclosure; and

FIG. 9 is a flow chart outlining a process example 900 for generating a model graph 600 for analyzing an opcode pattern according to an embodiment of the disclosure.

DETAILED DESCRIPTION OF EMBODIMENTS

The disclosed methods and systems below may be described generally, as well as in terms of specific examples and/or specific embodiments. For instances where references are made to detailed examples and/or embodiments, it is noted that any of the underlying principles described are not to be limited to a single embodiment, but may be expanded for use with any of the other methods and systems described herein as will be understood by one of ordinary skill in the art unless otherwise stated specifically.

Operation code or opcode generally refers to machine code usable to instruct a processor to perform a predetermined operation. Such predetermined operations can include addition, shifting, moving, or copying of a number, or changing a pointer indicating a position of a to-be-executed machine code to a predetermined position, and the like. Each processor or digital controller may be designed to respond to a predetermined set of machine codes, sometimes referred to as an instruction set. Example machine codes include the INTEL™ x86 instruction set and the INTEL™ Streaming Single-instruction-multiple-data Extension (SSE) instruction set for INTEL™ x86 processors or other x86 compatible processors; the 32-bit ARM™ instruction set and the 16-bit Thumb instruction set for ARM™ processors or other ARM™ compatible processors; the PowerPC Reduced Instruction Set Computing (RISC) instruction set for IBM™ PowerPC™ processors; or the scalable processor architecture RISC instruction set for SUN™ SPARC™ processors. Of course, it should be understood that the invention can be applied equally well to other existing or future machine codes.

FIG. 1 is a functional block diagram of an example threat detecting apparatus 100 coupled with a threat processor 180 according to an embodiment of the disclosure. As shown, the threat detecting apparatus 100 can include a data interface 110, a threat detecting processor 120 coupled with the data interface 110, and a memory circuit 130 coupled with the threat detecting processor 120. The data interface 110 is coupled with a data input port 112 and a data output port 116.

The threat detecting processor 120 can include an opcode detector 122 coupled with the data interface 110 and a pattern analyzer 126 coupled with the opcode detector 122. The opcode detector 122 and the pattern analyzer 126 are coupled with the memory circuit 130. A threat processor 180, which is external to the threat detecting apparatus 100, can be coupled with both the data interface 110 and the threat detecting processor 120.

The data interface 110 is configured to receive a data stream from a data network or a computer or other data source (for example, a hard drive, a USB drive, etc.) via the data input port 112. The data interface 110 is also configured to output the data stream to the threat detecting processor 120 and to another data network or another computer via the data output port 116. The data interface 110 can include a wireless network interface, such as BLUETOOTH, WIFI, WIMAX, LTE, GPRS, or WCDMA; or a wired network interface, such as ETHERNET, USB, or IEEE-1394.

The threat detecting processor 120 is configured to receive the data stream and to determine whether a threat is embedded in the received data stream. In order for a threat to affect a target machine, the threat may still need to cause the processor of the target machine to perform one or more operations that have detrimental effects on the target machine. Therefore, the threat detecting processor 120 detects the threat by identifying an opcode sequence embedded in the received data stream and determining whether the identified opcode sequence, to a statistically significant level of confidence, corresponds to a behavioral pattern of a threat. The threat detecting processor 120 is thus capable of identifying a threat whose binary signature has not been seen before.

The opcode detector 122 is configured to identify an opcode sequence embedded in the data stream. The pattern analyzer 126 is configured to identify an opcode signature embedded in the identified opcode sequence, where the opcode signature corresponds to a known detrimental behavior, in other words a threat. The pattern analyzer 126 is further configured to output an indication signal indicative of the successful identification of the threat. Therefore, the threat detecting processor 120 can look for statistically-significant indications of a machine code or a sequence of machine codes in the data stream by detecting an opcode sequence embedded in the data stream and an opcode signature embedded in the opcode sequence. Once such an indication has been identified to a pre-determined level of confidence, the threat detecting processor 120 reports an analysis result to the threat processor 180.

The memory circuit 130 is configured to store at least a portion of the model graph used by the opcode detector 122 or at least a portion of the model graph used by the pattern analyzer 126. In another embodiment, the model graph used by the opcode detector 122 or the model graph used by the pattern analyzer 126 is hard-wired in the threat detecting processor 120. The memory circuit 130 may store intermediate data generated by the opcode detector 122 or the pattern analyzer 126 while the opcode detector 122 or the pattern analyzer 126 is analyzing the data stream. In some embodiments, the memory circuit 130 is configured to store a set of instructions causing a processor to function as the opcode detector 122 or the pattern analyzer 126.

The threat processor 180 is configured to receive the analysis result from the threat detecting processor 120 and to take action accordingly to handle the detected threat. In some embodiments, a proper action in response to a detected threat includes causing the data interface 110 to delete or quarantine the infected file, causing the data interface 110 to stop outputting the data stream to the output port 116, continuing receiving the data stream after the threat is detected for further analysis, or performing another threat detecting/scanning operation to verify whether the detected threat is a false alarm. While the threat processor 180 is not shown as part of the threat detecting apparatus 100 in FIG. 1, it should be understood that in other embodiments the threat processor 180 may be implemented as part of the threat detecting apparatus 100.

Detailed operations of the threat detecting apparatus 100 will be further described with reference to FIGS. 2-6.

FIG. 2 is a flow chart outlining a process example 200 for detecting a threat according to an embodiment of the disclosure. In an example, the process 200 is executed by a threat detecting apparatus, such as the threat detecting apparatus 100. The process begins at S201 and proceeds to S210.

At S210, a data stream is received. For example, the threat detecting apparatus 100 receives the data stream via the data input port 112 of the data interface 110. The data interface 110 then transmits the received data stream to the opcode detector 122.

At S220, opcode sequences are identified in the data stream. As described above, the opcode detector 122 can determine, based on a first model graph, the existence of opcode sequences embedded in the data stream.

The first model graph includes a plurality of interconnected preamble nodes and token nodes. Each preamble node is a predetermined combination of tokens. Each token node is representative of an occurrence or a non-occurrence of a corresponding token. Each token corresponds to a predetermined combination of bits or bytes. For example, each token may be an 8-bit token, a 16-bit token, or a 32-bit token. By dividing the data stream into tokens and traversing the first model graph, based on the tokens of the data stream, from a preamble node to an end node, the existence of a corresponding opcode can be identified. In an example, at S220, at least a portion of the first model graph is retrieved from a memory circuit, such as the memory circuit 130.
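As a minimal sketch of the tokenization step described above, the following Python function divides a byte stream into fixed-width tokens. The function name `tokenize`, the `offset` parameter, the big-endian byte order, and the choice to drop a trailing partial token are all assumptions made for illustration; the disclosure does not specify them.

```python
from typing import List


def tokenize(stream: bytes, token_bits: int = 8, offset: int = 0) -> List[int]:
    """Split a byte stream into fixed-width tokens (hypothetical helper).

    token_bits: 8, 16, or 32, matching the token sizes named in the text.
    offset:     byte position at which the first token is assumed to begin.
    """
    if token_bits not in (8, 16, 32):
        raise ValueError("token width must be 8, 16, or 32 bits")
    width = token_bits // 8
    usable = stream[offset:]
    # Assumption: a trailing partial token is dropped rather than padded.
    end = len(usable) - (len(usable) % width)
    # Assumption: bytes within a token are combined in big-endian order.
    return [int.from_bytes(usable[i:i + width], "big") for i in range(0, end, width)]
```

Calling `tokenize(data, 16, offset=1)` would, under these assumptions, yield the 16-bit tokens for the hypothesis that the first token starts at the second byte.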

In some examples, the received data stream may have various byte alignments or data structure alignments, and an opcode thus may begin at any position in the received data stream. To account for the various possible byte alignments or data structure alignments, the opcode detector 122 can be a multi-hypothesis classifier that is usable to consider all possible variations in a single pass of the data stream. For example, suppose the received data stream includes data bytes, in chronological order, XX₁, XX₂, XX₃, XX₄, XX₅ . . . . Because of the various possible byte alignments, each byte of the received data stream could be the beginning of an opcode. As such, the opcode detector 122 may be implemented to process the received data stream with multiple hypotheses that each data byte XX₁, XX₂, XX₃, XX₄, XX₅ could be the starting point of a respective opcode.

In one embodiment, the opcode detector 122 may be configured to traverse the first model graph using N process threads based on N respective byte alignments, where N is an integer greater than one. For example, when the data stream XX₁, XX₂, XX₃, XX₄, and XX₅ has five possible byte alignments such that each byte could be the first byte of an opcode, the opcode detector 122 can traverse the first model graph using five process threads, each beginning with a respective data byte XX₁, XX₂, XX₃, XX₄, or XX₅ as the first byte. In an embodiment, the N process threads can be executed in parallel.
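The N-thread arrangement can be sketched as follows, with one worker per byte-alignment hypothesis. This is an illustration under stated assumptions, not the disclosed implementation: the opcode table `KNOWN_OPCODES`, the names `scan_alignment` and `scan_all_alignments`, and the simple prefix-matching scan that stands in for a model-graph traversal are all hypothetical. Stopping a hypothesis as soon as no opcode matches is likewise an assumed policy.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical opcode table (byte pattern -> name); values are illustrative only.
KNOWN_OPCODES: Dict[bytes, str] = {b"\x0f\x05": "OP_A", b"\x90": "OP_B"}


def scan_alignment(stream: bytes, offset: int) -> List[str]:
    """Scan under the hypothesis that an opcode begins at byte `offset`."""
    found: List[str] = []
    i = offset
    while i < len(stream):
        for pattern, name in KNOWN_OPCODES.items():
            if stream.startswith(pattern, i):
                found.append(name)
                i += len(pattern)
                break
        else:
            # Assumed policy: no opcode matches here, so this hypothesis stops.
            break
    return found


def scan_all_alignments(stream: bytes, n: int) -> Dict[int, List[str]]:
    """Run n alignment hypotheses (offsets 0..n-1), one worker thread each."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = pool.map(lambda off: scan_alignment(stream, off), range(n))
        return dict(zip(range(n), results))
```

Each returned entry maps a starting offset to the opcode sequence found under that hypothesis, so a downstream stage could analyze each sequence separately.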

In another embodiment, the opcode detector 122 may be configured to traverse the first model graph in a single process thread based on the data stream, and the first model graph is constructed to incorporate redundant paths based on the N possible byte alignments of the data stream. In some embodiments, the opcode detector 122 may be configured to traverse the first model graph using multiple process threads based on N₁ possible byte alignment variations, and the first model graph for each process thread is constructed to incorporate redundant paths based on N₂ possible byte alignment variations of the data stream. Accordingly, in this example, N₁×N₂ possible byte alignments of the received data stream are accounted for.

S220 is further described in detail with reference to FIGS. 3 and 4.

At S230, an opcode pattern is identified. For example, the pattern analyzer 126 can determine the existence of an opcode signature embedded in the identified opcode sequence and identify the opcode signature based on a second model graph. The second model graph includes a plurality of interconnected opcode nodes, and each opcode node of the interconnected opcode nodes is representative of an occurrence or a non-occurrence of a predetermined combination of one or more opcodes. By traversing the second model graph, based on the opcodes in the identified opcode sequence, from a starting node to an end node, the existence of a corresponding threat can be identified.
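The traversal from a starting node to an end node can be sketched as follows. The graph contents, the opcode mnemonics, and the names `OPCODE_GRAPH`, `END_NODES`, and `find_signature` are hypothetical. How a mismatch is handled is not described in this passage, so restarting from the starting node with the same opcode is an assumption of this sketch.

```python
from typing import Dict, List, Optional

START = "start"

# Hypothetical second model graph: node -> {opcode: next node}.
OPCODE_GRAPH: Dict[str, Dict[str, str]] = {
    "start": {"PUSH": "n1"},
    "n1": {"CALL": "n2"},
    "n2": {"XOR": "threat_A"},
}
# Hypothetical end nodes, each standing for one opcode signature.
END_NODES = {"threat_A"}


def find_signature(opcodes: List[str]) -> Optional[str]:
    """Walk the graph over an opcode sequence; return the end node reached, if any."""
    node = START
    for op in opcodes:
        nxt = OPCODE_GRAPH.get(node, {}).get(op)
        if nxt is None:
            # Assumption: on a mismatch, retry the same opcode from the start node.
            nxt = OPCODE_GRAPH[START].get(op, START)
        node = nxt
        if node in END_NODES:
            return node
    return None
```

A return value other than `None` plays the role of the indication that an opcode signature was identified.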

In some embodiments, the pattern analyzer 126 is a state classifier that is usable to identify a threat having a behavioral signature by traversing the second model graph. An example state classifier is described in U.S. patent application Ser. No. 13/617,879, the disclosure of which is incorporated herein by reference in its entirety. The second model graph may be constructed to reflect a predetermined confidence level that a threat identified by the pattern analyzer 126 is not a false alarm. In an example, at least a portion of the second model graph is retrieved from the memory circuit 130.

S230 is further described in detail with reference to FIGS. 5 and 6.

At S240, the analysis result is reported. To do so, the pattern analyzer 126 can output an analysis result, such as an indication signal indicative of the successful identification of a threat, to the threat processor 180. Then the process proceeds to S299 and terminates.

FIG. 3 is a flow chart outlining a more detailed exemplary process for identifying an opcode sequence, as described above in process step S220 in FIG. 2. In some embodiments, at step S220, multiple parallel processing threads may be executed for various possible byte alignments. The process described in step S220 can be executed by an opcode detector, such as the opcode detector 122. In FIG. 3, the process begins at S221 and proceeds to S310.

At S310, the first X tokens of the data stream are extracted, and a determination is made as to whether the first X tokens match one of the plurality of preamble nodes in the first model graph. X is an integer greater than one. For example, the opcode detector 122 can extract the first X tokens of the data stream based on a predetermined byte alignment setting. The opcode detector 122 compares the first X tokens with a plurality of preamble nodes in the first model graph and identifies a matched one of the plurality of preamble nodes. If a match is determined to exist, then the process proceeds to S320; otherwise the process proceeds to step S380.

At S320, a pointer used to trace the traversal of the first model graph is set to be indicative of the matched preamble node. In this disclosure, setting the pointer to be indicative of a particular node in a model graph is also described as moving the pointer to the particular node. The node that the pointer points at is also referred to as a current node in the present disclosure. In this step, the opcode detector 122 can move the pointer to the matched preamble node. Other approaches to trace the traversal of the first model graph are within various contemplated embodiments of the present disclosure.

At S330, a determination is made as to whether the matched preamble node is a live preamble node that has one or more token nodes connected thereto. When the matched preamble node is a live preamble node, the process proceeds to S340. When the matched preamble node is not a live preamble node, the first X tokens do not lead to any opcode detectable according to the first model graph, and the process proceeds to S380. In this step, the opcode detector 122 can determine whether the matched preamble node is a live preamble node.

At S340, a next token of the data stream that has not yet been referenced for traversing the first model graph is extracted, and a determination is made as to whether the next token matches a branch token node connected to the current node in the first model graph. For example, the opcode detector 122 may extract a next token of the data stream. When the pointer is at the matched preamble node, the next token is the token immediately after the first X tokens. The opcode detector 122 may compare the next token with one or more branch token nodes connected to the matched preamble node. When the pointer is at a token node, the next token is the token immediately subsequent to the previously referenced token. If a match is determined to exist, then the process proceeds to S350; otherwise the process proceeds to step S380.

At S350, the pointer used to trace the traversal of the first model graph is moved to the matched token node. For example, the opcode detector 122 can move the pointer to the matched branch token node.

At S360, a determination is made as to whether the current node is an end node such that the combination of tokens along the path the pointer has traced corresponds to a predetermined opcode. For example, the opcode detector 122 can determine whether the current node is an end node. When the current node is an end node, the process proceeds to S370. When the current node is not an end node, the combination of tokens along the path is not sufficient to support a conclusion that a predetermined opcode has been identified, and the process proceeds to S340 to continue traversing the first model graph based on the remaining portion of the data stream that has not yet been referenced for traversing the model graph.

From S310 to S360, the pointer has been moved to traverse the model graph starting from a matched one of the plurality of preamble nodes, through various branch token nodes based on the other tokens of the data stream in a sequential order, to an end node.

At S370, an opcode sequence record is updated to include an opcode corresponding to the end node at which the pointer currently is. For example, the opcode detector 122 can update an opcode sequence record based on the opcode corresponding to the end node. In an embodiment, the opcode detector 122 is a multi-hypothesis classifier implemented by executing N parallel process threads for N different byte alignments or data structure arrangements. The first model graph may be constructed such that if at S360 an invalid opcode is identified, the corresponding byte alignment or data structure arrangement is determined to be not applicable to the received data stream, and the respective process thread is thus terminated at S370.

At S380, a determination is made as to whether all tokens in the data stream have been referenced for traversing the model graph. For example, the opcode detector 122 may determine whether all tokens in the data stream have been processed. When all tokens in the data stream have been referenced to traverse the model graph, the process proceeds to S229. When there are some tokens in the data stream that have not been referenced to traverse the model graph, the opcode detector 122 can discard the processed tokens in the data stream and return to S310 to move the pointer to traverse the model graph again based on the remaining tokens of the data stream in a sequential order. In another embodiment, at S380, the opcode detector 122 concludes the analysis of the data stream and proceeds to S229 without traversing the model graph for a second round.

The process terminates at S229.
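The flow from S310 to S380 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the disclosed implementation: tokens are modeled as single characters of a string, the preamble length is fixed at X = 3, and the names `Node`, `add_opcode`, and `detect_opcodes` are hypothetical. The sketch also assumes that a token that fails to match at S340 counts as processed and is discarded, and that fewer than X remaining tokens ends the process.

```python
from typing import Dict, List, Optional

X = 3  # assumed preamble length in tokens (the text requires X > 1)


class Node:
    """A preamble node or token node of a hypothetical first model graph."""

    def __init__(self) -> None:
        self.branches: Dict[str, "Node"] = {}  # token -> branch token node
        self.opcode: Optional[str] = None      # set only on end nodes


def add_opcode(preambles: Dict[str, Node], tokens: str, name: str) -> None:
    """Add one path: preamble (first X tokens), then one token node per token."""
    node = preambles.setdefault(tokens[:X], Node())
    for tok in tokens[X:]:
        node = node.branches.setdefault(tok, Node())
    node.opcode = name


def detect_opcodes(tokens: str, preambles: Dict[str, Node]) -> List[str]:
    record: List[str] = []                         # opcode sequence record
    pos = 0
    while len(tokens) - pos >= X:                  # S380: unprocessed tokens remain
        node = preambles.get(tokens[pos:pos + X])  # S310: match first X tokens
        pos += X
        if node is None or not node.branches:      # S310 no match / S330 not live
            continue
        # S320: the pointer (`node`) is now at the matched preamble node.
        while pos < len(tokens):
            tok = tokens[pos]                      # S340: extract next token
            pos += 1
            nxt = node.branches.get(tok)
            if nxt is None:                        # no matching branch: go to S380
                break
            node = nxt                             # S350: move the pointer
            if node.opcode is not None:            # S360: end node reached?
                record.append(node.opcode)         # S370: update the record
                break
    return record
```

A live preamble node is represented here simply as a preamble entry with at least one branch; an entry with no branches behaves as a non-live preamble node.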

In some embodiments, the opcode detector 122 can be configured to process the data stream in a single pass without backing up. The opcode detector 122 may use a well-developed model graph with built-in redundant paths to exhaust all possible variations of opcodes of interest and all possible byte alignments or data structure alignments. Meanwhile, the opcode detector 122 may identify a possible opcode by performing a series of relatively simple memory comparison operations. Although such a well-developed model graph may occupy a relatively large storage space, the well-developed model graph may also reduce the computational complexity and improve the processing speed such that the received data stream can be processed in a single pass.

FIG. 4 is a graph diagram of an exemplary model graph 400 for detecting an opcode sequence according to an embodiment of the disclosure. The model graph 400 is usable as the first model graph used in conjunction with the process step S220 in FIG. 3.

The model graph 400 includes a preamble table 402 and a signature graph 406. The preamble table 402 includes a plurality of preamble nodes, each corresponding to a combination of three tokens. The signature graph 406 includes a plurality of token nodes, each indicative of the presence of a predetermined token. Some of the token nodes 411-415 are connected to corresponding preamble nodes in the preamble table 402. Some of the token nodes 421-425 are end nodes (E₁˜E_(k)) that correspond to identification of various opcodes. Each of the token nodes 421-425 is depicted with a null pointing symbol 409 indicating the termination of a current process for traversing the model graph. Accordingly, a combination of tokens along each path from a preamble node to an end node corresponds to successful identification of an opcode. In at least one example, the model graph 400 has built-in redundancy paths that may lead to the same end node in order to take different byte alignments or placeholder tokens into consideration. In some examples, the redundancy paths are implemented by splice paths 462 and 464 that connect one set of serially-linked token nodes to another set of serially-linked token nodes.

Each token can have any one of “t” possible values. For illustration purposes, in the example in FIG. 4, the tokens are described as decimal digits, where there are t=10 possible values for each token. Those skilled in the art will understand that tokens having a different number of possible values may also be used. Each preamble node includes three tokens, and the preamble table thus includes up to 10³ different preamble nodes. In this example, preambles 431 and 433 are not live preamble nodes because the process for traversing the model graph 400 ends at the preamble node as indicated by the null pointing symbols 409. Preambles 431, 434, and 436 are live preamble nodes that lead to different end nodes through various paths.

In the example in FIG. 4, an opcode composed of tokens “0033875249” is to be identified in two different scenarios.

In a first scenario, the data stream having tokens “0033875249 . . . ” is to be analyzed. The opcode detector 122 first extracts the first three tokens “003” and finds a matched preamble node 437 in the preamble table 402 (steps S310 and S320). Because the preamble node 437 is a live preamble node, the opcode detector 122 moves on to extract a next token “3” and finds a matched token node 412 (steps S330 and S340). The opcode detector 122 performs steps S340 to S360 repeatedly based on the remaining portion of the data stream “875249 . . . ” to move a pointer to traverse through token nodes 441, 442, 443, 444, and 445, and to token node 422. The token node 422 is an end node E₂ that corresponds to the opcode “0033875249,” and the opcode detector 122 updates an opcode sequence record to include the identified opcode “0033875249” (step S370). The opcode detector 122 then discards the processed tokens “0033875249” from the data stream and traverses the model graph 400 again based on the remaining portion of the data stream.

In a second scenario, the data stream having tokens “999120033875249 . . . ” is to be analyzed. The opcode detector 122 first extracts the first three tokens “999” and finds a matched preamble node 439 in the preamble table 402 (steps S310 and S320). Because the preamble node 439 is a live preamble node, the opcode detector 122 moves on to extract a next token “1” and finds a matched token node 415 (steps S330 and S340). The opcode detector 122 performs steps S340 to S360 repeatedly based on the remaining portion of the data stream “120033 . . . ” to move a pointer to traverse through token nodes 451, 452, 453, 454, and to token node 455. The next token “8” leads the pointer to be moved to token node 441 through a splice path 462. The opcode detector 122 continues performing steps S340 to S360 repeatedly based on the remaining portion of the data stream “75249 . . . ” to move the pointer to traverse through token nodes 442, 443, 444, and 445, and to token node 422. The token node 422 is an end node E₂ that corresponds to the opcode “0033875249,” and the opcode detector 122 updates an opcode sequence record to include the identified opcode “0033875249” (step S370). The opcode detector 122 then discards the processed tokens “0033875249” from the data stream and traverses the model graph 400 again from the preamble table 402 based on the remaining portion of the data stream.

As demonstrated by the first scenario and the second scenario, various possible byte alignments have been accounted for using multiple hypotheses in the model graph 400. As such, even without the knowledge of the exact byte alignment of the received data stream, the analysis thereof can still be processed in a single pass without backing up or rearranging the data stream for a second pass. Also, in some embodiments, the overall size of the model graph 400 can be reduced by merging various possible token sequences with the introduction of the splice paths.
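The two scenarios can be reproduced with a small sketch in which a splice path is simply a branch from the last node of one chain into a node of another chain, so both chains share the same tail and end node. The dictionary-based node layout and the names `new_node`, `chain`, and `walk` are hypothetical, and the reference numerals in the comments follow the description of FIG. 4 in the text rather than the figure itself.

```python
from typing import Dict, List, Optional


def new_node() -> dict:
    """A hypothetical graph node: outgoing branches plus an optional opcode label."""
    return {"next": {}, "opcode": None}


def chain(start: dict, tokens: str) -> List[dict]:
    """Append serially linked token nodes after `start`; return the nodes created."""
    created, cur = [], start
    for tok in tokens:
        cur = cur["next"].setdefault(tok, new_node())
        created.append(cur)
    return created


def walk(stream: str, preambles: Dict[str, dict]) -> Optional[str]:
    """Follow the stream from its 3-token preamble until an end node is reached."""
    node = preambles.get(stream[:3])
    for tok in stream[3:]:
        if node is None:
            return None
        node = node["next"].get(tok)
        if node is not None and node["opcode"]:
            return node["opcode"]
    return None


# Main path: preamble "003", then token nodes for "3", "8", "7", "5", "2", "4", "9".
p003, p999 = new_node(), new_node()
main = chain(p003, "3875249")          # main[0] ~ node 412, main[1] ~ node 441
main[-1]["opcode"] = "0033875249"      # last node ~ end node 422 (E2)

# Alternate path for the shifted stream: preamble "999", then "1", "2", "0", "0", "3", "3".
alt = chain(p999, "120033")            # alt[0] ~ node 415, alt[-1] ~ node 455
alt[-1]["next"]["8"] = main[1]         # splice path (~462) into the shared tail

PREAMBLES = {"003": p003, "999": p999}
```

Because the tail from the "8" node onward is stored once and reached from both chains, the misaligned stream is handled in a single pass without duplicating the remaining token nodes.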

The model graph 400 and the examples illustrated above are non-limiting examples. Other model graph configurations and arrangements of interconnected nodes are within various embodiments of the present disclosure.

FIG. 5 is a flow chart outlining a more detailed exemplary process for analyzing an opcode pattern as described in process step S230 according to an embodiment of the disclosure. The process S230 is an example processing thread for implementing step S230 described above in FIG. 2. In an example, the process S230 can be executed by a pattern analyzer, such as the pattern analyzer 126 and the like. The process begins at S231 and proceeds to S510.

At S510, a determination is made as to whether there is at least one unprocessed opcode. To do this, the pattern analyzer 126 may determine whether there is at least one unprocessed opcode in the identified opcode sequence, or whether the opcode detector 122 did not identify any opcode from the data stream. When the opcode detector 122 successfully identified an opcode sequence and the pattern analyzer 126 is processing the first opcode in the identified opcode sequence, a pointer for traversing a second model graph is set to a starting node of the second model graph. When it is determined that there is at least one unprocessed opcode, the process proceeds to S520. When it is determined that there is no opcode for further analysis, the process proceeds to S239. In some embodiments, the opcode detector 122 traverses the first model graph in multiple process threads and may identify multiple corresponding opcode sequences. Each of the multiple opcode sequences may be processed by the pattern analyzer 126 based on process S230 in separate process threads.

At S520, an unprocessed opcode is obtained for traversing the second model graph. The pattern analyzer 126 can extract a first opcode from the identified opcode sequence from the opcode detector 122 or a first opcode in the identified opcode sequence that has not been referenced for traversing the second model graph.

At S530, a determination is made as to whether there is a branch opcode node connected to the current node at which the pointer is that matches the obtained opcode. Here, the pattern analyzer 126 may determine whether a matched branch opcode node exists. When a matched branch opcode node exists, the process proceeds to S540. When no matched branch opcode node exists, the process proceeds to S580.

At S540, the pointer is moved to the matched branch opcode node. For example, the pattern analyzer 126 moves the pointer to the matched branch opcode node.

At S550, a determination is made as to whether the current opcode node is an end node such that the combination of opcodes along the path the pointer has traversed corresponds to a predetermined threat. For example, the pattern analyzer 126 can determine whether the current branch opcode node is an end node. When the current branch opcode node is an end node, the process proceeds to S560. When the current branch opcode node is not an end node, the combination of opcodes along the path is not sufficient to support a conclusion that a predetermined threat has been identified, and the process proceeds to S510 to continue traversing the second model graph based on the remaining portion of the opcode sequence that has not yet been referenced for traversing the second model graph.

From S510 to S550, the pointer has been moved to traverse the model graph from the starting node based on the opcodes in the identified opcode sequence in a sequential order to an end node.

Depending on the types of threats to be detected and the predetermined confidence level of the detection result, a successful traversal of the second model graph may be indicative of the quantity, nature, or a combination of the quantity and nature of one or more opcodes that is likely to correspond to a threat. In some embodiments, a threat as defined by a successful traversal of the second model graph may include detection of an opcode in a pure data stream, detection of a predetermined number of opcodes in a pure data stream, detection of an opcode corresponding to entering a protective mode, detection of a jump opcode, detection of a branch opcode, or detection of a no operation (NOP) opcode. In some embodiments, a threat as defined by a successful traversal of the second model graph may include a particular sequence of opcodes that corresponds to a detrimental behavior pattern.

In at least one example, prior to S510, the identified sequence of opcodes is checked for existence of jump opcodes, branch opcodes, or placeholder opcodes. The identified sequence of opcodes may be reordered according to the jump opcodes, branch opcodes, or placeholder opcodes for further analysis based on S510 to S550.

At S560, the identification of the threat is reported. In an example, the pattern analyzer 126 reports the identification of the threat corresponding to the end node at which the pointer currently is.

At S570, a determination is made as to whether to stop the analysis. The analysis may be stopped because all opcodes in the identified opcode sequence have been processed, or because there is no need to continue analyzing the data stream after a threat is detected and reported. For example, the pattern analyzer 126 determines whether to stop the analysis of the remaining unprocessed opcode(s). When it is determined to continue analyzing the remaining unprocessed opcode(s), the process proceeds to S580. When it is determined to stop analyzing the data stream, the process proceeds to S239.

At S580, the pointer is reset to the starting node of the second model graph, and a starting position indicating a first opcode in the identified opcode sequence to be analyzed in the next round of traversing the second model graph is adjusted. The pattern analyzer 126 can reset the pointer and adjust the starting position in the identified opcode sequence for the next round. The pattern analyzer 126 may set the starting position at the first unprocessed opcode in the identified opcode sequence. In one example, the pattern analyzer 126 may set the starting position at a processed opcode in the identified opcode sequence and mark all the subsequent opcodes as unprocessed opcodes.

The process terminates at S239.

FIG. 6 is a graph diagram of an exemplary model graph 600 for analyzing an opcode pattern according to an embodiment of the disclosure. The model graph 600 is usable as the model graph used in conjunction with the process S230 in FIG. 5.

The model graph 600 includes a starting node 610 and a plurality of interconnected opcode nodes 622, 624, 626, 632, 634, 636, 638, 639, 642, 644, 646, and 648. The downstream branch nodes for opcode nodes 626, 632, and 636 are not depicted in FIG. 6. Each opcode node 622-648 is indicative of presence of a predetermined opcode OP₁, OP₂, OP₃, OP₄, OP₅, or OP₆. Opcode nodes 642-648 are end nodes. A combination of opcodes along each path from the starting node 610 to an end node 642-648 corresponds to successful identification of a predetermined threat. In FIG. 6, the presence of opcode OP₆ may immediately lead to an end node 646. The opcode OP₆ may correspond to an opcode that only an operating system or a manufacturer may use to cause the processor to operate in a protective mode or a test mode. Therefore, in this embodiment, the model graph 600 is configured to declare the presence of opcode OP₆ in a data stream as a threat.

In at least one example, the model graph 600 has built-in redundancy paths that may lead to the same end node in order to take different placeholder or decoy opcodes into consideration.

In a first example according to FIG. 6, the identified opcode sequence having opcodes “OP₂, OP₄, OP₅, OP₁, . . . ” is to be analyzed. The pattern analyzer 126 first determines that the identified opcode sequence is not an empty sequence (step S510) and extracts the first opcode “OP₂” for processing (step S520). The pattern analyzer 126 then finds a matched branch opcode node 624 (step S530) and moves a pointer from the starting node 610 to the opcode node 624. The pattern analyzer 126 determines that opcode node 624 is not an end node (step S550) and thus proceeds to analyze the next opcode. Based on the subsequent opcodes OP₄ and OP₅, the pattern analyzer 126 sets the pointer to traverse the model graph 600 through opcode node 634 to opcode node 644.

The pattern analyzer 126 determines that opcode node 644 is an end node (step S550) and reports to the threat processor 180 that a threat corresponding to end node 644 has been detected (step S560). The pattern analyzer 126 may decide to analyze the remaining portion of the opcode sequence starting at opcode OP₁ or simply stop analyzing the opcode sequence.

In a second example according to FIG. 6, the identified opcode sequence having opcodes “OP₄, OP₂, OP₄, OP₅, . . . ” is to be analyzed. The pattern analyzer 126 first determines that the identified opcode sequence is not an empty sequence (step S510) and extracts the first opcode “OP₄” for processing (step S520). However, the pattern analyzer 126 cannot find a matched branch opcode node after the starting node 610. The pattern analyzer 126 thus drops the processed opcode “OP₄”, and continues processing the remaining portion of the opcode sequence “OP₂, OP₄, OP₅, . . . ” In the second iteration, the pattern analyzer 126 traverses the model graph 600 from starting node 610 to end node 644 through opcode nodes 624 and 634 as illustrated in the first example and reports that a threat corresponding to end node 644 has been detected.

In a third example according to FIG. 6, the identified opcode sequence having opcodes “OP₁, OP₂, OP₄, OP₅, . . . ” is to be analyzed. The pattern analyzer 126 traverses the model graph 600 using opcodes “OP₁, OP₂, OP₄” and then cannot find a matched branch opcode node for opcode “OP₅” afterwards. The pattern analyzer 126 then adjusts the starting position to the second opcode “OP₂”, traverses the model graph again, and identifies a threat corresponding to the opcode sequence of “OP₂, OP₄, OP₅” as illustrated in the first example. The pattern analyzer 126 may adjust the starting position by rolling back a predetermined number of opcodes. In an embodiment, the pattern analyzer 126 may adjust the starting position to a predetermined number of opcodes after the previous starting position. In another embodiment, the pattern analyzer 126 may adjust the starting position to a predetermined opcode among the processed opcodes after the previous starting position.

The model graph 600 and the examples illustrated above are non-limiting examples. Other model graph configurations and arrangements of interconnected opcode nodes are within various embodiments of the present disclosure.
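
The loop of steps S510 to S580 and the three examples can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the disclosed implementation: the nested-dictionary graph, the END marker, and the names GRAPH and find_threats are hypothetical. The OP₂, OP₄, OP₅ path to end node 644 and the OP₆ path to end node 646 follow FIG. 6 as described; the continuation of the OP₁ branch beyond “OP₁, OP₂, OP₄” is not given in the text and is filled with a made-up placeholder. The rollback policy shown restarts one opcode after the previous starting position, which is only one of the options described.

```python
END = "__end__"  # hypothetical marker key for an end node

# Starting node 610 is the outer dictionary; each key is a branch opcode.
GRAPH = {
    "OP2": {"OP4": {"OP5": {END: "end node 644"}}},  # via nodes 624 and 634
    "OP6": {END: "end node 646"},
    # Placeholder branch: only "OP1, OP2, OP4" is described in the text.
    "OP1": {"OP2": {"OP4": {"OP3": {END: "hypothetical end node"}}}},
}


def find_threats(opcodes, graph=GRAPH, stop_after_first=False):
    """Traverse the graph over an opcode sequence and list reached end nodes."""
    threats = []
    start = 0
    while start < len(opcodes):            # S510: any unprocessed opcode?
        node = graph                       # pointer at the starting node
        i = start
        matched = None
        while i < len(opcodes):
            child = node.get(opcodes[i])   # S520 and S530: look for a branch
            if child is None:
                break                      # no matched branch opcode node
            node = child                   # S540: move the pointer
            i += 1
            if END in node:                # S550: end node reached?
                matched = node[END]
                break
        if matched is not None:
            threats.append(matched)        # S560: report the threat
            if stop_after_first:           # S570: stop or continue
                break
            start = i                      # S580: first unprocessed opcode
        else:
            start += 1                     # S580: roll back to the opcode
                                           # after the previous start
    return threats


print(find_threats(["OP2", "OP4", "OP5", "OP1"]))  # first example
print(find_threats(["OP4", "OP2", "OP4", "OP5"]))  # second example
print(find_threats(["OP1", "OP2", "OP4", "OP5"]))  # third example
```

In the third example the first round consumes “OP₁, OP₂, OP₄” and fails on “OP₅”; restarting at “OP₂” marks the already visited opcodes as unprocessed again, so the OP₂, OP₄, OP₅ path is still found.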

FIG. 7A is a system block diagram of an example threat detecting apparatus 100A according to an embodiment of the disclosure. The threat detecting apparatus 100A is an example implementation of the threat detecting apparatus 100 in FIG. 1. Components in FIG. 7A that are the same or similar to the components in FIG. 1 are given the same reference numbers.

The threat detecting apparatus 100A includes a data interface 110, a memory circuit 130, and a processor 710. The memory circuit 130 is configured to store a set of instructions 132, at least a portion of the first model graph or at least a portion of the second model graph 134, and intermediate data 136 for performing the process as illustrated in FIGS. 2, 3, and 5. The processor 710 is configured to execute the set of instructions 132 to function as the opcode detector 122 and the pattern analyzer 126.

In some embodiments, the memory circuit 130 is an electronic, magnetic, optical, electromagnetic, infrared, and/or a semiconductor system (or apparatus or device). For example, the memory circuit 130 may include a semiconductor or solid-state memory, a magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk, and/or an optical disk. In some embodiments, the processor 710 is a central processing unit (CPU), a multi-processor, a distributed processing system, and/or a suitable processing unit.

FIG. 7B is a system block diagram of another example threat detecting apparatus 100B according to an embodiment of the disclosure. The threat detecting apparatus 100B is another example implementation of the threat detecting apparatus 100 in FIG. 1. Components in FIG. 7B that are the same or similar to the components in FIG. 7A are given the same reference numbers.

The threat detecting apparatus 100B includes a data interface 110, a memory circuit 130, and an application specific integrated circuit (ASIC) 720. The memory circuit 130 is configured to store at least a portion of the first model graph or at least a portion of the second model graph 134 and intermediate data 136 for performing the process as illustrated in FIGS. 2, 3, and 5. The ASIC 720 is configured to function as the opcode detector 122 and the pattern analyzer 126. In one example, at least a portion of the first model graph or a portion of the second model graph may be hard-wired in the ASIC 720.

In at least one example, the threat detecting apparatus 100 is implemented by a combination of the processor 710 and the ASIC 720.

FIG. 8 is a flow chart outlining an exemplary process 800 for generating a model graph, such as the model graph 400, for detecting an opcode sequence according to an embodiment of the disclosure. The model graph 400 depicted in FIG. 4 will be referenced as a non-limiting example. In an example, the process 800 is executed by a threat detecting apparatus, such as the threat detecting apparatus 100 and the like. In another example, the process 800 is executed by a computer. The process begins at S801 and proceeds to S810.

At S810, a new data signature that corresponds to a known opcode is obtained. The obtained data signature is in the form of a token string. The obtained data signature may include only the opcode itself or variations of the opcode under different byte alignments.

At S820, the data signature is divided into a preamble and a body. For example, as depicted in FIG. 4, a data signature having a token string “0033875249” may be divided into a preamble “003” (preamble 437) and a body “3875249” (token nodes 412, 441-445, and 422).

At S830, a point of divergence between the body and the graph is identified. For example, when the preamble node corresponding to the preamble of the data signature has not yet connected to any token node, the point of divergence is at the preamble node. In another example, after a data signature having a token string “9991200339 . . . ” is added to the model graph 400, a point of divergence of a data signature having a token string “999120033875249” is at token node 455.

At S840, a point of merge between the body and the graph after the point of divergence is identified. In some embodiments, the point of merge is identified by comparing the data signatures that are variations of the same opcode. For example, after a data signature having a token string “9991200339 . . . ” and a data signature having a token string “0033875249” are added to the model graph 400, a point of merge of a data signature having a token string “999120033875249” is at token node 441. In some examples, there may not be an identifiable point of merge.

At S850, the body of the data signature is added or merged to the model graph 400 based on the identified point of divergence or point of merge. In the example discussed above, after a data signature having a token string “9991200339 . . . ” and a data signature having a token string “0033875249” are added to the model graph 400, a point of divergence and a point of merge of a data signature having a token string “999120033875249” are at token node 455 and token node 441, respectively. To add or merge token string “999120033875249” to model graph 400, a splice path 462 is established pointing from token node 455 to token node 441. In one embodiment, when there is no identifiable point of merge, the token nodes corresponding to the tokens after the point of divergence are established and linked according to the order of the token string.

At S860, a determination is made as to whether there is another data signature to be processed and added to model graph 400. When there is no other data signature to be included in model graph 400, the process proceeds to S899 and terminates. When there is at least one new data signature to be included in model graph 400, the process proceeds to S810.
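
One way to picture steps S820 to S850 is the following Python sketch. It is offered only as an illustration under assumptions: the Node and TokenGraph classes, the add_signature method, and the rule used to find a point of merge (an existing node entered by the same token whose remaining path reaches an end node for the same opcode) are hypothetical and are not taken from the disclosure. The signature “9991200339 . . . ” is truncated in the text, so the sketch uses only the ten tokens shown, with a made-up opcode label.

```python
class Node:
    def __init__(self, token=None):
        self.token = token      # token that leads into this node
        self.children = {}      # token -> Node
        self.opcode = None      # set only on end nodes


class TokenGraph:
    def __init__(self, preamble_len=3):
        self.preamble_len = preamble_len
        self.preambles = {}     # preamble string -> preamble Node
        self.nodes = []         # all token nodes, searched for merge points

    def _reaches(self, node, tokens, opcode):
        """True if following tokens from node ends at an end node for opcode."""
        for t in tokens:
            node = node.children.get(t)
            if node is None:
                return False
        return node.opcode == opcode

    def add_signature(self, signature, opcode):
        """Add one token string; return True if a splice path was created."""
        # S820: divide the signature into a preamble and a body.
        pre = signature[:self.preamble_len]
        body = signature[self.preamble_len:]
        node = self.preambles.setdefault(pre, Node())
        # S830: walk existing nodes until the point of divergence.
        d = 0
        while d < len(body) and body[d] in node.children:
            node = node.children[body[d]]
            d += 1
        # S840 and S850: look for a point of merge, else add new nodes.
        for j in range(d, len(body)):
            target = next(
                (n for n in self.nodes
                 if n.token == body[j]
                 and self._reaches(n, body[j + 1:], opcode)),
                None,
            )
            if target is not None:
                node.children[body[j]] = target  # splice path
                return True
            new = Node(body[j])
            self.nodes.append(new)
            node.children[body[j]] = new
            node = new
        node.opcode = opcode                     # mark the end node
        return False


g = TokenGraph()
g.add_signature("0033875249", "0033875249")
g.add_signature("9991200339", "hypothetical-opcode")  # truncated stand-in
before = len(g.nodes)
spliced = g.add_signature("999120033875249", "0033875249")
print(spliced, before, len(g.nodes))
```

In this sketch the third signature adds no token nodes: it diverges at the node reached by “999120033” minus its last token (the counterpart of token node 455) and is spliced onto the existing node entered by “8” (the counterpart of token node 441). A splice also admits any other path that already continues from the merge node, so a real generator would need to check that the merged paths are intended.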

FIG. 9 is a flow chart outlining an exemplary process 900 for generating a model graph, such as the model graph 600, for analyzing an opcode pattern according to an embodiment of the disclosure. The model graph 600 depicted in FIG. 6 will be referenced as a non-limiting example. In an example, the process 900 is executed by a threat detecting apparatus, such as the threat detecting apparatus 100 and the like. In another example, the process 900 is executed by a computer. The process begins at S901 and proceeds to S910.

At S910, a new opcode sequence that corresponds to a known threat is obtained. The obtained opcode sequence may include only the minimum number of opcodes sufficient to identify the known threat or opcode sequence variations of the known threat having different decoy or redundant opcodes inserted therein.

At S920, a point of divergence between the opcode sequence and the graph is identified. For example, when the starting node 610 has not yet connected to any opcode node, the point of divergence is at the starting node. In another example, after an opcode sequence “OP₂, OP₅, . . . ” is added to the model graph 600, a point of divergence of an opcode sequence “OP₂, OP₄, OP₅” is at opcode node 624.

At S930, a point of merge between the opcode sequence and the graph after the point of divergence is identified. In some embodiments, the point of merge is identified by comparing the opcode sequences that are variations of the same threat. In some examples, there may not be an identifiable point of merge.

At S940, the opcode sequence is added or merged to the model graph 600 based on the identified point of divergence or point of merge. In one embodiment, when there is no identifiable point of merge, the opcode nodes corresponding to the opcodes after the point of divergence are established and linked according to the order of the opcode sequence.

At S950, a determination is made as to whether there is another opcode sequence to be processed and added to model graph 600. When there is no other opcode sequence to be included in model graph 600, the process proceeds to S999 and terminates. When there is at least one new opcode sequence to be included in model graph 600, the process proceeds to S910.
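
For the case in which there is no identifiable point of merge, steps S920 and S940 reduce to a prefix walk followed by appending new opcode nodes. The Python below is a minimal sketch of that case only; the nested-dictionary graph, the END marker, the add_sequence name, and the threat labels are assumptions made for illustration. The sequence “OP₂, OP₅, . . . ” is truncated in the text, so only the two opcodes shown are used.

```python
END = "__end__"  # hypothetical marker key for an end node


def add_sequence(graph, opcodes, threat):
    """Add an opcode sequence to the graph and return the divergence depth.

    The depth is the number of leading opcodes that already had nodes,
    so 0 means the sequence diverges at the starting node.
    """
    node = graph
    depth = 0
    # S920: follow existing opcode nodes up to the point of divergence.
    for op in opcodes:
        if op not in node:
            break
        node = node[op]
        depth += 1
    # S940 without a merge point: create and link the remaining nodes.
    for op in opcodes[depth:]:
        node = node.setdefault(op, {})
    node[END] = threat
    return depth


graph = {}  # plays the role of starting node 610
first = add_sequence(graph, ["OP2", "OP5"], "hypothetical threat A")
second = add_sequence(graph, ["OP2", "OP4", "OP5"], "hypothetical threat B")
print(first, second)
```

The second call returns a depth of one: the sequence “OP₂, OP₄, OP₅” shares the node for OP₂ (opcode node 624 in FIG. 6) and diverges there, matching the example given for S920.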

While aspects of the present disclosure have been described in conjunction with the specific embodiments thereof that are proposed as examples, alternatives, modifications, and variations to the examples may be made. Accordingly, embodiments as set forth herein are intended to be illustrative and not limiting. There are changes that may be made without departing from the scope of the claims set forth below.

What is claimed is:
 1. A threat detecting apparatus, comprising: an interface circuit configured to receive a data stream; an opcode detector implemented by hardware circuitry and configured to identify an opcode sequence embedded in the data stream based on a first model graph, the first model graph including a plurality of interconnected token nodes, each token node of the plurality of interconnected token nodes being representative of an occurrence or a non-occurrence of a token, and each token being a predetermined combination of bits or bytes; and a pattern analyzer implemented by the hardware circuitry and configured to identify an opcode signature embedded in the identified opcode sequence based on a second model graph, and to output a signal indicative of successful identification of the opcode signature, the second model graph including a plurality of interconnected opcode nodes, each opcode node of the plurality of interconnected opcode nodes being representative of an occurrence or a non-occurrence of a predetermined combination of one or more opcodes, wherein the opcode detector is configured to traverse the first model graph in N process threads based on the data stream and N different byte alignments, or to traverse the first model graph, which incorporates redundant paths based on the N possible byte alignments of the data stream, in one process thread based on the data stream, where N is an integer greater than one.
 2. The threat detecting apparatus of claim 1, wherein the opcode detector is configured to: identify a matched one of a plurality of preamble nodes in the first model graph that matches first X tokens of the data stream, X being an integer greater than one; move a pointer to traverse the first model graph starting from the matched one of the plurality of preamble nodes based on the other tokens of the data stream in a sequential order; and when the pointer reaches an end node of the first model graph, update an opcode sequence record to include an opcode corresponding to the end node.
 3. The threat detecting apparatus of claim 2, wherein the opcode detector is further configured to, after identifying an opcode, discard the processed tokens in the data stream and traverse the first model graph based on the remaining portion of the data stream to identify a next opcode in a recursive manner.
 4. The threat detecting apparatus of claim 1, wherein the pattern analyzer is configured to: move a pointer to traverse the second model graph from a starting node of the second model graph based on opcodes of the identified opcode sequence in a sequential order; and when the pointer reaches an end node of the second model graph, report identification of an opcode signature corresponding to the reached end node.
 5. The threat detecting apparatus of claim 1, further comprising: a memory circuit configured to store at least a portion of the first model graph or at least a portion of the second model graph.
 6. The threat detecting apparatus of claim 5, wherein the memory circuit is configured to store a set of instructions, and the hardware circuitry of the threat detecting apparatus comprises a processor configured to execute the set of instructions to function as the opcode detector or the pattern analyzer.
 7. The threat detecting apparatus of claim 1, wherein the hardware circuitry comprises: an application-specific integrated circuit (ASIC) configured to function as the opcode detector or the pattern analyzer.
 8. The threat detecting apparatus of claim 7, wherein at least a portion of the first model graph or a portion of the second model graph is hard-wired in the ASIC.
 9. A method for detecting a threat, comprising: receiving a data stream by an interface circuit; identifying an opcode sequence embedded in the data stream by an opcode detector based on a first model graph, the first model graph including a plurality of interconnected token nodes, each token node of the plurality of interconnected token nodes being representative of an occurrence or a non-occurrence of a token, and each token being a predetermined combination of bits or bytes; identifying an opcode signature embedded in the identified opcode sequence by a pattern analyzer based on a second model graph, the second model graph including a plurality of interconnected opcode nodes, each opcode node of the plurality of interconnected opcode nodes being representative of an occurrence or a non-occurrence of a predetermined combination of one or more opcodes; and outputting a signal indicative of successful identification of the opcode signature by the pattern analyzer, wherein the identifying the opcode sequence embedded in the data stream comprises traversing the first model graph in N process threads based on the data stream and N possible byte alignments, or traversing the first model graph, which incorporates redundant paths based on the N possible byte alignments of the data stream, in one process thread based on the data stream, where N is an integer greater than one.
 10. The method of claim 9, wherein identifying an opcode sequence embedded in the data stream comprises: identifying a matched one of a plurality of preamble nodes in the first model graph that matches first X tokens of the data stream, X being an integer greater than one; moving a pointer to traverse the first model graph starting from the matched one of the plurality of preamble node based on the other tokens of the data stream in a sequential order; and when the decision pointer reaches an end node of the first model graph, updating an opcode sequence record to include an opcode corresponding to the reached end node.
 11. The method of claim 10, wherein identifying an opcode sequence embedded in the data stream further comprises: after identifying an opcode, discarding the processed tokens in the data stream and traversing the first model graph based on the remaining portion of the data stream to identify a next opcode in a recursive manner.
 12. The method of claim 9, wherein identifying an opcode signature embedded in the identified opcode sequence comprises: moving a pointer to traverse the second model graph from a starting node of the second model graph based on opcodes of the identified opcode sequence in a sequential order; and when the pointer reaches an end node of the second model graph, report identification of an opcode signature corresponding to the reached end node.
 13. The method of claim 9, further comprising: retrieving a portion of the first model graph or a portion of the second model graph from a memory circuit.
 14. A threat detecting apparatus, comprising: a threat detection circuit configured to: identify an opcode sequence embedded in a data stream based on a first model graph, the first model graph including a plurality of interconnected token nodes, each token node of the plurality of interconnected token nodes being representative of an occurrence or a non-occurrence of a token, and each token being a predetermined combination of bits or bytes; identify an opcode signature embedded in the identified opcode sequence based on a second model graph, the second model graph including a plurality of interconnected opcode nodes, each opcode node of the plurality of interconnected opcode nodes being representative of an occurrence or a non-occurrence of a predetermined combination of one or more opcodes; and output an indication signal indicative of successful identification of the opcode signature, wherein the threat detection circuit is configured to traverse the first model graph in N process threads based on the data stream and N different byte alignments, or to traverse the first model graph, which incorporates redundant paths based on the N possible byte alignments of the data stream, in one process thread based on the data stream, where N is an integer greater than one.
 15. The threat detecting apparatus of claim 14, further comprising: a memory circuit configured to store a set of instructions and at least a portion of a first model graph or at least a portion of a second model graph, wherein the threat detection circuit comprises a processor configured to execute the set of instructions to function as the threat detection circuit.
 16. The threat detecting apparatus of claim 14, wherein the threat detection circuit comprises: an application-specific integrated circuit (ASIC) configured to function as the threat detection circuit.